chore: Handle empty tokenization in perplexity and allow ad-hoc Advan… #802

arnold-jr · 2024-08-29T16:23:49Z

…cedAttackMetric

What does this PR do?

Handles empty input_ids when calculating perplexity, which previously raised a RuntimeError when using "gpt2" as the tokenizer.
Allows adding metrics to AdvancedAttackMetric with configurable init params.

Summary

This PR cleans up some confusing errors in the perplexity calculation when it handles empty inputs. Instead of a RuntimeError, the perplexity calculation now raises a ValueError when it encounters empty inputs. The PR also adds regression tests around this error and makes it possible to pass a different tokenizer to the perplexity calculation in AdvancedAttackMetric.
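The guard described above can be sketched roughly as follows. This is a minimal, library-free illustration and not the code in this PR: `perplexity_from_ids` and the `token_log_prob` callback are hypothetical names standing in for the real tokenizer and language model.

```python
import math
from typing import Callable, Sequence


def perplexity_from_ids(
    input_ids: Sequence[int],
    token_log_prob: Callable[[Sequence[int], int], float],
) -> float:
    """Hypothetical helper: perplexity of a token sequence.

    `token_log_prob(prefix, token)` is an assumed callback that returns the
    natural-log probability of `token` given the preceding `prefix`.
    """
    # Fail early with a clear error instead of letting an empty sequence
    # reach the model and surface as an opaque runtime failure.
    if len(input_ids) == 0:
        raise ValueError("Cannot compute perplexity: tokenization produced no input_ids.")

    # Sum the log-probabilities of each token given its prefix.
    total_log_prob = 0.0
    for i, token in enumerate(input_ids):
        total_log_prob += token_log_prob(input_ids[:i], token)

    # Perplexity is exp of the negative mean log-probability.
    return math.exp(-total_log_prob / len(input_ids))


def uniform_over_four(prefix: Sequence[int], token: int) -> float:
    """Toy model: every token has probability 1/4."""
    return math.log(0.25)


if __name__ == "__main__":
    print(perplexity_from_ids([1, 2, 3], uniform_over_four))
    try:
        perplexity_from_ids([], uniform_over_four)
    except ValueError as err:
        print(f"ValueError: {err}")
```

Raising a ValueError at the boundary keeps the failure close to its cause (an input that tokenizes to nothing), which is easier to act on than an error from deep inside the model call.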

Additions

Added handling of empty tokenization during perplexity calculation.
Added tests to test_metric_api that reproduce the perplexity errors on empty inputs using "gpt2".
Added a method for appending an ad-hoc metric to AdvancedAttackMetric.
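As a rough illustration of appending an ad-hoc metric with configurable init params, consider the sketch below. Every name in it (`MetricCollection`, `add_metric`, `LongWordCount`) is hypothetical and chosen for this example; it does not reproduce the AdvancedAttackMetric API or the method added in this PR.

```python
from typing import Any, Dict, List, Type


class LongWordCount:
    """Toy metric (illustrative only): average number of long words per text."""

    def __init__(self, min_length: int = 1) -> None:
        self.min_length = min_length

    def calculate(self, texts: List[str]) -> Dict[str, float]:
        if not texts:
            return {"avg_words": 0.0}
        total = sum(
            1 for text in texts for word in text.split() if len(word) >= self.min_length
        )
        return {"avg_words": total / len(texts)}


class MetricCollection:
    """Hypothetical container that holds named metric instances."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Any] = {}

    def add_metric(self, name: str, metric_cls: Type[Any], **init_kwargs: Any) -> None:
        # Build the metric here so callers can pass their own init params.
        if name in self._metrics:
            raise ValueError(f"Metric {name!r} is already registered.")
        self._metrics[name] = metric_cls(**init_kwargs)

    def calculate(self, texts: List[str]) -> Dict[str, Dict[str, float]]:
        # Run every registered metric and key the results by metric name.
        return {name: metric.calculate(texts) for name, metric in self._metrics.items()}


if __name__ == "__main__":
    collection = MetricCollection()
    collection.add_metric("long_words", LongWordCount, min_length=5)
    print(collection.calculate(["hello tiny world", "a b"]))
```

Passing the class together with keyword arguments, rather than a pre-built instance, is one way to let the caller choose settings such as a different tokenizer without the container hard-coding them.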

Changes

Perplexity with "gpt2" no longer throws a torch RuntimeError on empty input_ids.

Deletions

Checklist

[x] The title of your pull request should be a summary of its contribution.
[x] Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
[x] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it).
[x] To indicate a work in progress, please mark it as a draft on GitHub.
[x] Make sure existing tests pass.
[x] Add relevant tests. No quality testing = no merge.
[x] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

chore: Handle empty tokenization in perplexity and allow ad-hoc Advan…

f315e8d

…cedAttackMetric
